SUMMARY


Twenty-two credit officers from a major California lending
institution served as subjects in a criterion validation of
multiattribute utility elicitation techniques. The techniques
tested were the Holistic Orthogonal Parameter Estimation (HOPE)
technique (Barron and Person, 1978), the Simple Multiattribute
Rating Technique (SMART; Edwards, 1977), point distribution, and
three rank weighting techniques discussed in Stillwell and
Edwards (1979). Equal weighting of importance dimensions was also
investigated. The criterion against which the judgments were
compared was the lending institution's own credit scoring model.
This model is based on statistical analysis of over 8,000 cases
from the bank records and is a "best fit" prediction model.

Results  demonstrate  that  subjective  judgments  of  importance 
weighting  show  a  high  degree  of  agreement  in  application  selection 
and  in  total  utility  realized  from  that  selection.  Decomposition 
techniques  did  somewhat  better  than  holistic  techniques. 
Introduction

Suppose that you are a bank officer and must decide
whether credit should be granted to a number of applicants.
With each credit application you receive data, for example,
about the age and previous credit record of the applicant.
You have some rules of thumb about how these data relate to
overall creditworthiness, but there is too much information
to integrate intuitively. How can you evaluate potential
candidates in this situation?

Multi-attribute utility* measurement (MAUM) is the name
of a class of models and measurement procedures developed to
aid decision makers in such complex decision problems. MAUM
evaluates options separately on each of a list of
value-relevant attributes. These single-attribute evaluations
are then combined by a formal model, usually using judgmental
weights.

In the simplest case the weighted single attribute
evaluations are added to obtain an overall value of the
alternative. Formally, this additive model can be expressed as

$$U(x(j)) = \sum_{i=1}^{n} w(i)\, u(i)[x(ij)]$$


* The models tested in this paper are, strictly speaking,
value models or riskless utility models. We simply refer
to riskless utilities as utilities.


where U(x(j)) is the total utility for member j of option
set x, w(i) is the weight for attribute i, and u(i) is the
single-dimension utility function transforming the value of
x on dimension i into utility scaling. The additive model
requires that attributes be preferentially independent (see
Krantz, Luce, Suppes, and Tversky, 1971). Less formally,
this means that the overall utility of an individual
attribute is independent of other attribute values.
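
As an illustration only, the additive model can be sketched in a
few lines of Python. The attribute names, weights, and
single-attribute utility functions below are hypothetical and are
not taken from the bank model; they only show how the weighted
single-attribute utilities are summed.

```python
def additive_utility(option, weights, utility_fns):
    """Return the sum over attributes i of w(i) * u(i)[x(ij)] for one option."""
    return sum(weights[a] * utility_fns[a](option[a]) for a in weights)


# Hypothetical weights (summing to 1) and single-attribute utility functions.
weights = {"age": 0.25, "credit_record": 0.75}
utility_fns = {
    # Linear utility over an assumed 18-60 age range, clipped to [0, 1].
    "age": lambda years: min(max((years - 18) / 42, 0.0), 1.0),
    # Categorical utility for an assumed three-level credit record.
    "credit_record": lambda level: {"poor": 0.0, "fair": 0.5, "good": 1.0}[level],
}

applicant = {"age": 39, "credit_record": "good"}
total = additive_utility(applicant, weights, utility_fns)
print(total)  # 0.25 * 0.5 + 0.75 * 1.0 = 0.875
```

Because the weights and utilities enter only through a sum of
products, the contribution of each attribute does not depend on
the levels of the others, which is the independence requirement
described above.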

A number of methods have been proposed for determining
both the u(i) functions and the w(i) weights. For practical
purposes, these methods differ primarily in two ways:
strength of theoretical justification and ease of use.
Unfortunately, these two dimensions conflict. At one
extreme are the highly complex, theoretically impeccable
methods discussed by Keeney and Raiffa (1976) or Dyer and
Sarin (1979). Somewhere in the middle are the easier but
theoretically more problematic methods of Edwards's SMART
technique (1977). Still simpler techniques can be based on
ranking information (Stillwell and Edwards, 1979) or even
equal weighting (Dawes and Corrigan, 1974). These simple
techniques are defensible only as approximations, but that
would be a highly persuasive defense if they led to
essentially the same results as more complex and demanding
methods.

This paper focuses on a comparison of weighting
methods. Technical issues concerned with the u(i) can be
equally important. But measures of u(i) have been less
controversial, since they are reasonably often simply
monotone transformations on objective measures of i. In
particular, the issue of central concern to this paper is
whether or not complex and sophisticated methods of
eliciting weights are worthwhile, in two different senses.
Ultimately, a weighting method would be preferable to
another in spite of additional difficulty in its use only if
it did two things: changed the conclusion about what option
is preferable, or by how much, and did so in a manner that
made the conclusion more nearly correct.

Validity issues in MAU

The second of the two criteria mentioned above raises
the most perplexing problem of any MAU technique: validity.
Values are inherently subjective. In what sense, if any,
can one elicitation technique be said to be more valid than
another?

A familiar decision-analytic answer is: none. Most
decision analysts apply the techniques as though validity,
at least of utilities, is assumed. Practicing decision
analysts, like other practitioners of clinical skills, must
depend on user satisfaction as an important validating
criterion. But if it is the only one, it is difficult to
see how decision analysts are different from other
well-trained and highly paid advisers who also give their
clients satisfaction.

Aware of the difficulty, researchers have tried various
approaches to validating decision-analytic tools and ideas.
A relatively traditional approach has depended on convergent
validation (Pollack, 1964; Huber et al., 1971; Fischer,
1971). This approach compares overall utilities calculated
from a multi-attribute utility model (or statistically
derived bootstrapping model) with holistic preference
responses. MAUM utilities for each alternative are usually
compared with holistic ratings over a set of alternatives or
sometimes with choices among alternatives.
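
A minimal sketch of such a convergent comparison, assuming the
agreement measure is an ordinary Pearson correlation between
model-derived utilities and holistic ratings of the same
alternatives. The numbers are invented for illustration and do not
come from any of the cited studies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: decomposed (model) utilities and holistic ratings
# for the same four alternatives.
maum_utilities = [0.2, 0.4, 0.6, 0.8]
holistic_ratings = [30, 35, 70, 75]

r = pearson_r(maum_utilities, holistic_ratings)
print(round(r, 2))  # roughly 0.94 for these made-up numbers
```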

Other variations of the convergent approach compare
results of different models and techniques with one another
or even of different subjects with one another (see Fischer,
1977 for a more complete discussion of convergent validation
in MAUM). Results of these and other studies of convergent
validity have typically found correlations between
decomposed and holistic responses of .7 to .9, with most in
the high .80s to low .90s. Advocates of the convergent
approach suggest that these results are "quite encouraging"
(von Winterfeldt and Fischer, 1975). Shepard (1964), Hoepfl
and Huber (1970), Edwards (1971) and others argue that MAUM
procedures should not be validated using holistic responses
as a criterion. Holistic responses may include substantial
random error (see Shepard, 1964; Slovic and Lichtenstein,
1971). Indeed, as Slovic, Fischhoff, and Lichtenstein
(1977) point out, a decomposed judgment procedure that did
capture the random as well as systematic components of
holistic preferences would be indefensible as an improvement
over the holistic procedure. Holistic responses may also
suffer from systematic bias. Responses may represent
simplifying strategies of the decision maker. A more
general argument also applies. If the goal of MAU
procedures is to reproduce holistic judgments, they are a
waste of time, since holistic judgments are usually easier
to elicit.

The preceding paragraph is encouraging to defenders of
MAU. Too high a correlation with holistic procedures would
call the complexity of MAU procedures into question as
unnecessary; too low a correlation would lead one to wonder
whether the MAU procedures were in fact capturing the
relevant values. Correlations in the .7 to .9 region are
just about right for escaping both complaints.

Various procedures can be used to check whether the
judgments that enter into a MAU elicitation conform to
axioms of "reasonable behavior". Keeney and Raiffa (1976)
spell out procedures for such tests, and Tversky (1967), von
Winterfeldt (1971) and Fischer (1975) have studied
conformity to various axioms experimentally. Such studies
are usually not relevant to validity as here conceived.
They test the appropriateness of specific axioms; if those
axioms are inappropriate, the practicing decision analyst
would face the viable options of ignoring the
inappropriateness and treating the result as a good
approximation (often an extremely useful and appropriate
strategy) or of using other elicitation methods that do not
depend on the violated axiom. Decision analytic elicitation
procedures exist in bewildering variety; failure of just
about any axiom except the most fundamental ones (e.g.,
transitivity) can be circumvented.

While judgments that are consistent and orderly provide
theoretical and practical justification of the MAUM model,
no study of them can provide empirical demonstration of
MAUM's ability to produce good decisions. A third approach
to the validation problem therefore lies in finding an
external criterion of correctness against which to validate
value judgments. In the first such study, Yntema and
Torgerson (1961) taught subjects the relationship between
various cues and an arbitrary worth criterion. Then, using
a number of different assessment procedures, they elicited
the subjects' knowledge of the relationships. Since the
experimenters had a priori defined the true relationships,
they could directly compare the subjects' judgments with the
results produced by the defined model.

The experimental procedure is essentially equivalent to
the Brunswikian lens model paradigm (Brunswik, 1952;
Hammond, 1966; Slovic and Lichtenstein, 1971) and its
derivative, the Multiple Cue Probability Learning (MCPL)
paradigm (Hammond, 1966; Slovic and Lichtenstein, 1971). In
an MCPL study the subject is taught the relationship between
individual cues and a criterion variable. For example, a
subject could be taught a hypothetical relationship between
the size, weight and speed of a football player and his
overall ability. The relationship can be and has been taught
by presenting feedback about the true model outcome
(Schmitt, 1978), the true ratio of importance weights
(Brehmer and Qvarnstrom, 1976), and/or validity coefficients
themselves (Schmitt, Coyle, and Saari, 1977).

Although extensively used to examine subjective versus
objective weighting techniques in the prediction context, the
MCPL paradigm has only recently come into use as a MAUM
validation procedure (John and Edwards, 1978b; John, Collins
and Edwards, 1980). In an experimental task in which
subjects were asked to evaluate the dollar value of diamonds
described on the four characteristics cut, color, carat, and
clarity, subjects were taught an arbitrarily defined value
model. As in the experiment discussed by Yntema and
Torgerson (1961), various techniques were then used to
elicit weight judgments. The model used to generate the
training stimuli was thus a criterion against which to test
the resulting judgments.

The results of this research argue for the use of MAUM
techniques. In a recent review of importance weight
assessment research, John and Edwards (1978a) conclude:

"...the weighting literature reviewed,
and particularly the recent criterion
validation work, suggests that the
concept of attribute importance is a
psychologically meaningful one. For
many of the laboratory and field
settings studied, subjects gave
responses to direct subjective
assessments of importance weights that
were both consistent (high convergent
validity) and accurate (high criterion
validity)."

Thus, at least in this highly contrived laboratory
situation, subjects seem quite able to learn the
relationships of individual cues to outcome variables and
express these relationships in a meaningful, quantitative
way. In addition, the work of John and Edwards (1978b) and
John, Collins and Edwards (1980) provides direct evidence
for subjects' ability to report what they have learned using
standard MAUM techniques.

A clear picture emerges from the theoretical
investigation of weighting judgment. Subjects in laboratory
settings are able to both learn weighting functions and
express them in response to MAUM elicitation techniques.
But questions remain about whether decision makers in a
real-world setting perform equally well. In only a few cases
has a real-world criterion been used to evaluate the
decomposition idea. Fischer (1977) discusses a study by
Lathrop and Peters (1969) based on course evaluations for
fourteen introductory psychology courses. Students in those
classes gave ratings of a number of individual factors for
each course and an overall course evaluation rating. The
ratings were averaged and the averages treated as objective
value measures. Students who were not enrolled in these
courses either were given the average score of a class on
each attribute and asked to judge the average overall rating
(holistic judgment) or were simply asked to assign weights
to each of the individual attributes (decomposed judgment).
This study found that across a number of conditions,
decomposed models afforded better prediction than did the
intuitive judgments despite the fact that the subjective
weights were decidedly non-optimal compared to weights
derived from multiple regression.

A second study, performed by Eils and John (1980),
again used college students as subjects. Groups of college
student subjects were to evaluate potential credit
applicants, described on 10 dimensions. The criterion was a
statistically based credit model obtained from a local bank.
The experimenters found that the SMART decomposition
procedure (Edwards, 1977) significantly improved the ability
of groups to produce judgments corresponding to the bank
model criteria over holistic judgments.

The results of both studies support the decomposition
approach. They are steps in the right direction. But the
subjects were inexpert, the studies did not compare
alternative weight elicitation techniques, and the only
conclusion to which they can lead is that one decomposition
procedure works better than an alternative based on holistic
judgments.

This study sets out to remedy as many of these defects
as possible. It uses highly expert subjects, performing a
task that they must perform every work day, and for which
they are extensively trained. It uses a criterion that is
both valid and realistic, in the sense that the procedures
used to derive it ensure its validity, that decisions are
based on it, and that the subjects are extensively trained
on it and experienced in its use. The entire decision task
is as realistic as an experiment permits; stimuli and issues
bearing on the decisions are the same as in normal daily
decision-making.

Expertise and the Bank Model Criterion

Most financial institutions use some statistical model
to facilitate credit granting decisions for high volume,
relatively low dollar amount loans, including decisions to
give credit cards to would-be users. Many legal constraints
limit the information the lending institution may use.
Within these constraints, the credit scoring models use both
readily available numbers and less structured inputs as
predictors. For example, descriptive attributes of credit
applicants might include age, sex, credit history or even
appearance.

One class of credit scoring model comes from applying
discriminant analysis to good and bad accounts. Detailed
definitions of "good" and "bad" vary from bank to bank; they
depend on repayment history. Discriminant analysis finds
the linear prediction equation that maximizes some
difference measure between good and bad accounts, using
weights on the available predictors.


Such a discriminant model was used as the criterion in
this study. Its construction started with the collection of
a sample of 4000 good and 4000 bad accounts, stratified by
population and area. The analysis then determined which
applicant attributes best discriminated between the good and
bad accounts for this sample. It used the 7 best predictors
in a percentage-of-variance-accounted-for sense. Table 1
shows the normalized weights for the bank model, ordered by
rank. In addition, Table 1 shows the weight sets for rank
sum, rank reciprocal and equal weights, weight approximation
techniques to be discussed later.

The model thus derived was converted into an additive
point scoring system for use by the bank officers as a
decision aid. Each level of each attribute contributes
points to a sum representing the creditworthiness of an
applicant. Any point sum can be converted directly into a
probability of default for that applicant.
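
The mechanics of an additive point scoring system of this kind
might look like the following sketch. The attributes, levels, and
point values are hypothetical placeholders, not the bank's actual
table, and the conversion from point sum to probability of default
is not shown because its form is not given here.

```python
# Hypothetical point table: attribute -> level -> points.
# These values are illustrative only; the bank's real table is not reproduced.
POINTS = {
    "years_at_job": {"under_1": 0, "1_to_5": 10, "over_5": 20},
    "credit_history": {"poor": 0, "fair": 15, "good": 35},
}


def credit_score(application):
    """Sum the points contributed by the level of each attribute."""
    return sum(POINTS[attribute][level] for attribute, level in application.items())


score = credit_score({"years_at_job": "1_to_5", "credit_history": "good"})
print(score)  # 10 + 35 = 45
```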

Bank officers' experience with this point scoring
system comes in several forms. First, the officers are
given explicit model information. That is, they are told
the exact relationship between the attribute levels and the
overall credit score. In addition, they are explicitly
trained in the relationship between attribute levels and the
probability of default as determined from the sample data.

Bank officers also receive what is essentially outcome
feedback from direct use of the model. As an application
comes to the officer, that officer will first determine the
overall credit score for the applicant based on information
presented on the application. The officer then makes a
credit decision for that applicant. The granting officer's
name is then appended to the application and subsequent
credit record. Tallies are kept over all applications
approved by a given officer and an ongoing record is
presented to that officer periodically. In addition, each
time an account turns from good to bad the granting officer
is given the entire credit file for review. Finally, the
officers are given a monthly report in which the number of
acceptances and rejections are broken down by credit score.
There is, therefore, some pressure on the officer to avoid
the simple strategy, i.e., granting credit to only those
applicants about whom there is relative certainty. It is
interesting to note that bank records show that this bank
extends credit to approximately  of its applicants.

The bank officers' experience with the model, in each
of the forms discussed above, is extensive. During any
given week's work an officer will make from 10 to over 1000
credit decisions to which the model is directly relevant.
In addition, training in the use of the model and its
relation to creditworthiness and probability of default is
initially extensive and continues throughout the career of
the officer.

The bank has a cut-off credit score at or above which
extension of credit is recommended, and below which the bank
recommends that the application be rejected. This cut-off
score fluctuates periodically in response to the
availability of money to the bank and the bank's financial
condition. We must stress that this score is a
recommendation only. The individual credit granting officer
has a great deal of personal latitude for overriding the
model recommendation. Of course, the amount and type of
latitude is based on the record, position, and experience of
the officer. Some officers may override the model with a
simple signature, others must include an explanation, while
still others must convince another officer.

Weight Elicitation Procedures
Rank Weighting Procedures

Three different weight elicitation procedures were
tested that use some aspect of the rank ordering of value
dimensions to arrive at dimension weights. Two of the three
require that the subject provide only the rank ordering of
importance dimensions, while the third requires the
additional information of the weight assigned by the subject
to the dimension considered most important. Each of these
techniques is discussed in detail in Stillwell and Edwards
(1979).

The first rank weighting procedure, called Rank Sum
(RS) weighting, is arrived at via the following formula:

$$W(i) = \frac{n - R(i) + 1}{\sum_{j=1}^{n} [n - R(j) + 1]} \tag{1}$$

where W(i) is the normalized weight for dimension (i), n is
the number of dimensions, and R(i) is the rank position of
dimension (i). This rank weighting procedure is common in
the weighting literature. Dimensions are simply given
weight equivalent to the normalized inverse ranking of their
place among other dimensions. For example, for a three
dimension case the dimension ranked first would be given a
weight of 3/(3+2+1) = .5.
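
The rank sum calculation can be sketched as follows, assuming
ranks are given as integers with 1 denoting the most important
dimension and no ties. The function name is ours, chosen for
illustration.

```python
def rank_sum_weights(ranks):
    """Rank Sum weights: W(i) = (n - R(i) + 1) / sum_j (n - R(j) + 1)."""
    n = len(ranks)
    raw = [n - r + 1 for r in ranks]  # inverse ranks: rank 1 -> n, rank n -> 1
    total = sum(raw)
    return [value / total for value in raw]


rs = rank_sum_weights([1, 2, 3])
print(rs[0])  # 3 / (3 + 2 + 1) = 0.5, matching the three-dimension example
```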

Rank Reciprocal (RR) weights are derived from the
normalized reciprocals of the dimension rank. They are
defined by the following formula:

$$W(i) = \frac{1/R(i)}{\sum_{j=1}^{n} 1/R(j)} \tag{2}$$

where again W(i) is the normalized weight for dimension (i),
R(i) is the rank of dimension (i) and n is the number of
dimensions. For three dimensions the RR weight for the
first dimension would be (1/1)/(1/1 + 1/2 + 1/3) = .55.
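
A corresponding sketch for rank reciprocal weights, under the same
assumptions (integer ranks, 1 for the most important dimension, no
ties). The function name is illustrative.

```python
def rank_reciprocal_weights(ranks):
    """Rank Reciprocal weights: W(i) = (1 / R(i)) / sum_j (1 / R(j))."""
    reciprocals = [1.0 / r for r in ranks]
    total = sum(reciprocals)
    return [value / total for value in reciprocals]


rr = rank_reciprocal_weights([1, 2, 3])
print(round(rr[0], 2))  # 6/11, which rounds to 0.55 as in the text
```

Compared with rank sum weights for the same three ranks, the
reciprocal rule gives somewhat more weight to the first-ranked
dimension (.55 versus .5).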

The third rank weighting procedure, Rank Exponent (RE)
weights, requires one additional piece of information. The
respondent judges the weight of the most important attribute
on the usual 0-1 scale. Other weights are computed by:

$$W(i) = \frac{[n - R(i) + 1]^{z}}{\sum_{j=1}^{n} [n - R(j) + 1]^{z}} \tag{3}$$

where z is an exponent; the larger z is, the steeper the set
of weights becomes. z = 1 defines rank sum weights; z = 0
defines equal weights. The other variables are the same as
in equations (1) and (2). The respondent's judgment of W(1)
permits solution of the equation for z, and given z, the
rest of the weights can be calculated. (See Stillwell and
Edwards, 1979 for details).
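
One way to carry out this solution numerically is sketched below.
Bisection is our choice for illustration; Stillwell and Edwards
(1979) should be consulted for the procedure actually used. The
sketch assumes the judged weight of the top-ranked attribute lies
between 1/n (equal weights) and 1, so that z is non-negative.

```python
def rank_exponent_weights(ranks, z):
    """Rank Exponent weights: proportional to (n - R(i) + 1) ** z."""
    n = len(ranks)
    raw = [(n - r + 1) ** z for r in ranks]
    total = sum(raw)
    return [value / total for value in raw]


def solve_exponent(n, top_weight, tol=1e-10):
    """Find z >= 0 such that the rank-1 weight equals top_weight (bisection)."""
    if n < 2 or not (1.0 / n <= top_weight < 1.0):
        raise ValueError("top_weight must lie in [1/n, 1) with n >= 2")

    def top(z):
        # Weight of the rank-1 dimension for a given exponent z.
        return n ** z / sum(k ** z for k in range(1, n + 1))

    lo, hi = 0.0, 1.0
    while top(hi) < top_weight:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:  # top(z) increases with z, so bisection applies
        mid = (lo + hi) / 2.0
        if top(mid) < top_weight:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


z = solve_exponent(3, 0.5)  # a judged top weight of .5 should recover z = 1
re_weights = rank_exponent_weights([1, 2, 3], z)
```

With three dimensions and a judged top weight of .5 the solved
exponent is approximately 1, so the resulting weights coincide
with the rank sum weights, as the text indicates.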

Instructions for the rank ordering procedure asked
respondents to put the attributes in order from most to
least important in determining credit score. The point was
stressed that attributes equal in importance should be
indicated. The respondents were next asked to consider only
the attribute they ranked first. They were to provide the
proportion of the total weight that they would assign to
that attribute.

Ratio weighting

Three weight elicitation procedures result in weight
sets said to have ratio properties. The first of these, the
Simple Multi-Attribute Rating Technique (SMART) (Edwards,
1977), requires that the subjects first rank order the
importance or value dimensions, then assign an arbitrary
value of 10 to the dimension ranked last. Weights are then
assigned to the other dimensions in ascending order,
relative to the anchor weight on the lowest dimension,
maintaining importance ratios between dimensions. For
example, if the respondent considers the most important
dimension 15 times as important as the least important one,
he or she should assign a weight of 150. The least
important dimension is then discarded, the second least
important dimension given the value 10, and the ratio
procedure repeated. At this point the respondent is asked
to reconcile any inconsistencies. The SMART procedure
followed the judgment of the weight to the most important
dimension.
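
As a sketch of how raw SMART ratio judgments become a normalized
weight set, assume the raw judgments are simply divided by their
sum. The dimension names and raw values are hypothetical; only the
anchor of 10 and the 15-to-1 ratio come from the example above.

```python
def normalize_ratio_weights(raw_weights):
    """Divide each raw ratio judgment by the total so the weights sum to 1."""
    total = sum(raw_weights.values())
    return {name: value / total for name, value in raw_weights.items()}


# Hypothetical raw judgments: least important dimension anchored at 10,
# most important judged 15 times as important (150).
raw = {"dimension_a": 150, "dimension_b": 40, "dimension_c": 10}
smart_weights = normalize_ratio_weights(raw)
print(smart_weights["dimension_a"])  # 150 / 200 = 0.75
```

Normalization leaves the judged ratios intact: dimension_a still
carries 15 times the weight of dimension_c.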

A second ratio weight elicitation procedure examined is
called Holistic Orthogonal Parameter Estimation (HOPE),
outlined in Barron and Person (1979). Essentially a
bootstrapping procedure (Slovic and Lichtenstein, 1971;
Dawes, 1974), HOPE utilizes a fractionalized Analysis of
Variance (ANOVA) design to derive weights and location
measures for categorical or categorized continuous
variables. Subjects make a number of holistic judgments of
decision alternatives determined by the design requirements.
These judgments are analyzed via the ANOVA procedure whereby
differences between marginal means are used as estimates of
weights and location measures. For the purposes of this
study, the HOPE procedure was constrained to an additive
model. In order to conserve the respondents' time we were
forced to provide an abbreviated HOPE design. All
applications shown respondents included a single level of
the attribute that had lowest weight in the bank model.
This attribute could therefore not be evaluated since it had
no variance. In addition, a single level was left out for
two other attributes. Even with this shortened format
judgments of 25 applications were required of each
respondent in a fractional design (Winer, 1971).
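
The idea of reading weights off marginal means can be illustrated
with a toy example. This is only a sketch under strong
assumptions: it uses a full 2 x 2 design rather than the
fractional design of the study, invented attribute names and
judgments, and takes each attribute's weight to be proportional to
the range of its marginal means. It is not the HOPE algorithm as
specified by Barron and Person.

```python
from collections import defaultdict


def marginal_means(judgments, attribute):
    """Mean holistic judgment at each level of one attribute."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for profile, score in judgments:
        level = profile[attribute]
        sums[level] += score
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


def range_weights(judgments, attributes):
    """Weights proportional to the range of each attribute's marginal means."""
    spans = {}
    for attribute in attributes:
        means = marginal_means(judgments, attribute)
        spans[attribute] = max(means.values()) - min(means.values())
    total = sum(spans.values())
    return {attribute: spans[attribute] / total for attribute in attributes}


# Hypothetical holistic credit-score judgments over a full 2 x 2 design.
judgments = [
    ({"history": "good", "tenure": "long"}, 90.0),
    ({"history": "good", "tenure": "short"}, 70.0),
    ({"history": "poor", "tenure": "long"}, 30.0),
    ({"history": "poor", "tenure": "short"}, 10.0),
]
hope_weights = range_weights(judgments, ["history", "tenure"])
print(hope_weights)  # history: 60 / 80 = 0.75, tenure: 20 / 80 = 0.25
```

An attribute shown at only one level has a single marginal mean
and therefore a range of zero, which is why the lowest-weight
attribute could not be evaluated in the abbreviated design.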


The holistic judgments required by the HOPE procedure
were interspersed between each of the other sets of
judgments. For each of the HOPE judgments respondents were
presented with a single page on which appeared the attribute
categories describing that application. A space was
provided in which the subject was to give his or her
judgment of the credit score for that applicant. We
stressed to the officers that they were not to simply add up
the scores for the individual attributes but instead give a
judgment of the overall credit score.

In the final weight elicitation procedure subjects were
asked to distribute 100 points over the value dimensions so
as to reflect their feeling about the relative importance of
value dimensions to total value (Hoffman, 1960). John and
Edwards (1978a) suggest that this procedure leads subjects
to attend to the differences between numbers of points given
a pair of dimensions rather than the ratios, although no
empirical test of this suggestion has been made. If it is in
fact true, the resulting weights could, at best, be treated
as interval level information. The point distribution
procedure followed the final set of HOPE judgments.

Equal Weights

In addition to the six weight elicitation techniques
discussed above, equal weighting of importance dimensions
was tested. Both experimental (Dawes and Corrigan, 1974)
and theoretical (Wainer, 1976; 1978) work have provided
evidence and rationale for the effort-saving device of
simply adding the normalized single dimension utilities.


Method


Subjects 

Subjects for the experiment were 22 officers from a
major California bank. All respondents were familiar with
the bank credit model used as the criterion for their
judgments and were experienced with making credit decisions
as part of their normal job routine. Respondents ranged from
3 to 27 years (mean = 10.0) experience with credit lending
institutions and from 1 to 27 years (mean = 6.6) with their
current employer.

Procedure 

Each respondent was run in a single experimental
session. These sessions ranged in length from 35 to 95
minutes. Each respondent worked individually with an
experimenter. All experimenters had decision analytic
training and experience.

Stimuli 

Each respondent used a response booklet containing the
total set of judgments required for all elicitation
techniques. The order of presentation of weight elicitation
procedures and location measure elicitation was partially
determined by the nature of the information required.
Location measure judgments were elicited before any of the
weight elicitations were made so that respondents were aware
of the ranges of the relevant attributes. Rank order weight
elicitation judgments were made before ratio weight
elicitations since SMART requires the rank order as input.
The order of presentation was thus:
-General instructions
-Respondent individual information
-Location measure judgments
-Ranking of attribute importance
-Weight of the most important attribute
-SMART judgments
-Point distribution
-The 25 holistic judgments required for HOPE
were interspersed in a random order for
each respondent, between other procedures.

Instructions stressed that we were trying to capture
respondents' expertise in their judgments. They were told
that they would make judgments both about individual
applicants and about descriptive attributes of applicants
for credit. We also asked subjects for general background
information (age, sex, etc.) and specific information about
their credit granting experience (for example, years with
this bank, number of credit models with which they have
worked).

Respondents were next presented with a list of
locations on or values of each attribute. They were asked
to select the worst value of an attribute, assign it a utility
of 0, and then select the best value of the attribute and
assign it a utility of 100. Respondents then placed the
rest of the attribute values on this 0-100 scale relative to
the endpoints. This procedure constituted the location
measure elicitation. Finally, respondents made weight
elicitation judgments as discussed earlier.
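
One property of location measures elicited this way is whether they
rise or fall consistently across the ordered values of an attribute.
The following is a minimal sketch of such a check, assuming the
attribute values have a natural order and the utilities are listed in
that order. The function name and the example utilities are
hypothetical and are not taken from the study.

```python
def is_monotonic(utilities):
    """True if utilities never reverse direction across ordered attribute values."""
    diffs = [b - a for a, b in zip(utilities, utilities[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


# Hypothetical 0-100 location measures for four ordered attribute values
consistent = [0, 30, 70, 100]   # rises steadily from worst to best
reversed_pair = [0, 60, 40, 100]  # the middle two values are out of order

print(is_monotonic(consistent))
print(is_monotonic(reversed_pair))
```

A reversal like the one in the second list is what would count as a
non-monotonicity under this assumed definition.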

Respondents completed the judgments necessary for each
procedure before going on to the next. Respondents were
asked not to refer back to previous judgments or change any
of those judgments. All elicitations were done
interactively until the experimenter was confident that the
subject understood the procedure. Questions were allowed at
any time during the experimental session and subjects were
encouraged to express any confusion or misunderstanding.

Our hope was to examine the procedures in a form as near as
possible to that in which they would be found in a real
world application of that technique.


Results 


The data analysis for this experiment is in two parts.
First, we directly compared the normalized weight sets that
resulted from respondents' judgments. Two such comparisons
were made. Table 2 shows the true weight and the mean,
median, and standard deviation, across respondents, of
weights for each attribute by each elicitation technique.
Attributes are numbered in order of true weight. Looking
across attributes, several things become evident. In each of
the self-explicated weighting techniques, both median and
mean responses show that respondents felt attribute 2 to be
more important than attribute 1. But the weights derived
from the holistic judgments of HOPE suggest that when
actually making judgments of credit score respondents
correctly identify attribute 1 as more important. A second
finding is that SMART and rank exponent weighting result in
more peaked weight sets, as evidenced by the larger ratios
between the highest and lowest weighted attributes. HOPE
cannot be so evaluated. For HOPE, the lowest weighted
attribute was not included in the design and thus this
ratio has no meaning. Finally, the results suggest that
although analysis of holistic responses correctly identified
the most important attribute, the rest of the attributes are
very close in mean and median weight. On the other hand,
the self-explicated techniques correctly produced weights
for the first two attributes that are much higher than for
the attributes ranked third through seventh.

In order to analyze more closely the quality of weight
judgments a second comparison of the weights resulting from
the different elicitation techniques was performed. We
examined Cumulative Frequency Distributions (CFD) of the
absolute values of the differences between the true weights
and those resulting from each elicitation procedure and
approximation technique. This analysis was across both
subjects and dimensions. For this analysis we define
dominance in a CFD as: CFD A dominates CFD B if and only if
for any value of absolute difference the cumulative
frequency for A is greater than or equal to the cumulative
frequency for B.
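
The dominance definition above can be sketched in code. This is a
minimal illustration, assuming each technique is represented by a
list of absolute differences pooled across subjects and dimensions.
The function names and the example values are hypothetical, not data
from the study.

```python
import numpy as np


def cfd(abs_diffs, grid):
    """Cumulative frequency: share of absolute differences <= each grid value."""
    ordered = np.sort(np.asarray(abs_diffs, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


def dominates(diffs_a, diffs_b):
    """True if CFD A is >= CFD B at every value of absolute difference.

    Both CFDs are step functions that change only at observed values,
    so comparing them on the union of observed values is sufficient.
    """
    grid = np.union1d(diffs_a, diffs_b)
    return bool(np.all(cfd(diffs_a, grid) >= cfd(diffs_b, grid)))


# Hypothetical absolute differences |true weight - elicited weight|
technique_a = [0.01, 0.02, 0.03, 0.05]
technique_b = [0.02, 0.04, 0.06, 0.09]

print(dominates(technique_a, technique_b))
print(dominates(technique_b, technique_a))
```

Under this definition a dominating technique has at least as large a
share of small errors at every error size, which is why dominance
over the entire range is a demanding condition.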

Only a few distributions show dominance over the entire
range of values. The difference distribution of rank sum
weights dominates those of rank exponent, SMART, and equal
weights. HOPE dominates rank exponent and equal weights and
point distribution dominates equal weights. In terms of the
average absolute deviation the ordering of techniques is
rank sum (49.5), point distribution (51.2), HOPE (52.4),
rank reciprocal (56.3), SMART (69.9), equal weights (70.7),
and rank exponent (79.6).

The second part of our analysis addresses the practical
significance of the differences found in weight judgments.
We looked at the same type of decisions the bank officers
make in the performance of their job. For this purpose we
used a sample of 200 real applications for credit at the
bank. These applications were chosen to be representative
of the general population of applications that an officer is
likely to see in his or her usual job performance. Figure 1
displays the distribution of true utilities of these 200
applications as calculated from the bank model. It is
apparent that the distribution of true utilities (rescaled
from 0 to 100) is skewed slightly to the left. The mean of
the distribution is 66.3. A value of 68.1 is the decision
point equal to or above which credit is given as outlined by
bank rules.

Substantial negative correlations between attributes in
a multi-attributed context can lead to weight sensitivity
and, in the presence of suboptimal weights, poor selection
ordering (Stillwell and Edwards, 1979; Newman, 1977; Newman,
Seaver and Edwards, 1976; McClelland, 1978). Table 3 shows
the correlations between dimensions for the 200 sample
applications. No correlation is meaningfully negative.
This fact guarantees that all weighting procedures,
including equal weights, will do reasonably well. One
handicap of the quest for realism in stimuli, criteria, and
respondents is that we must take the stimuli we can get, and
cannot design into them properties that would increase the
strength of the experimental design. Even if we could have
designed negative correlation into the applicant set, we
would have hesitated to do so. The resulting applicant set
would inevitably have seemed very strange indeed to the
respondents.

Table 3. Interdimensional correlations: 200 sample applications.

In order to compare elicitation procedures, values of
overall utility were calculated for each of the 200
applications using the bank model and each of the weight
sets from the different elicitation procedures and location
measure sets. For each subject the utilities derived from
the bank model were then correlated with those calculated
from each of the weight elicitation procedure-location
measure combinations. These correlations were then averaged
across subjects. The results of this analysis are shown in
Table 4. For example, the average correlation, across 22
subjects, between overall utilities calculated from the bank
model and those from the SMART weight elicitation procedure
and judgmental location measures is .881.
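
The correlation analysis just described can be sketched as follows.
This is a minimal illustration, assuming an additive utility model in
which overall utility is the weighted sum of 0-100 location measures
with weights normalized to sum to one. The function name, the three
attributes, the five applications, and all weight values are
hypothetical stand-ins, since the bank's attributes are confidential.

```python
import numpy as np


def overall_utility(weights, location_measures):
    """Weighted sum of location measures, with weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.asarray(location_measures, dtype=float) @ w


# Hypothetical data: 5 applications x 3 attributes on the 0-100 scale
locations = [
    [80, 60, 40],
    [20, 90, 70],
    [50, 50, 50],
    [95, 10, 30],
    [60, 75, 85],
]
criterion_weights = [0.5, 0.3, 0.2]                 # stands in for the bank model
elicited_weights = [[0.4, 0.4, 0.2], [1, 1, 1]]     # one weight set per respondent

criterion_u = overall_utility(criterion_weights, locations)

# Correlate each respondent's utilities with the criterion, then average
correlations = [
    np.corrcoef(criterion_u, overall_utility(w, locations))[0, 1]
    for w in elicited_weights
]
mean_correlation = float(np.mean(correlations))
print(mean_correlation)
```

In the study the same computation would be repeated for each
combination of weight set and location measure set. The sketch uses
one location matrix for simplicity.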

The bank credit scoring model led to the selection of
98 of the 200 applications for credit. In addition to the
correlations, Table 4 shows the average number out of those
98 that would have been chosen by each of the other
techniques. For instance, using HOPE weights and HOPE
location measures, an average across subjects of 77.8 of the
correct 98 would have been granted credit. Assuming that 98
applicants were to be extended credit this also means, of
course, that an average of 20.2 applications would have been
given credit by the HOPE procedure that would not have been
given credit by the bank model.

The last column of Table 4 shows the proportion of
total utility, as calculated by the bank model, that would
have been realized from selections resulting from each weight
set-location measure combination. Again this assumes that 98
were to be granted credit. This number is scaled such that
1.0 is the total utility of the best 98 applications as
determined by the bank model and 0.0 is the total utility of
the lowest 98. For example, if the decision maker had used
rank reciprocal weights and the bank model location
measures, the 98 selections, averaged across subjects, would
have resulted in 97% of the total possible utility being
realized.

Table 4. Weight set-location measure combinations and the bank model

                                   Correlation     Number of
                 Location          with            max 98
Weights          measures          bank model      selected

HOPE             HOPE              .730*           77.8*
SMART            Judgmental        .881            85.3
Rank Sum         Judgmental        .934            87.7
Rank Recip.      Judgmental        .887            83.9
Rank Exponent    Judgmental        .860            82.6
Dist. 100 Pts.   Judgmental        .921            86.5
Equal            Judgmental        .926            86.0
SMART            Bank model        .923            88.8
Rank Sum         Bank model        .964            91.2
Rank Recip.      Bank model        .927            88.5
Rank Exponent    Bank model        .907            86.3
Dist. 100 Pts.   Bank model        .959            90.6
Equal            Bank model        .938            86.0

*One subject is not included in this average. Due to inappropriate
responses to the holistic judgments no HOPE weights or location
measures could be calculated.

[Proportion of total utility column: values not legible in source.]
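
The selection count and the scaled proportion of utility described
above can be sketched as follows. This is a minimal illustration,
assuming that each technique grants credit to the k applications it
ranks highest (k = 98 in the study) and that ties are broken by
original order. The function name and the six example utilities are
hypothetical.

```python
import numpy as np


def selection_performance(true_u, approx_u, k):
    """Return (number of the true best k also chosen, scaled utility proportion).

    The proportion is 1.0 when the chosen set has the total true utility
    of the best k applications and 0.0 when it has that of the lowest k.
    It is undefined if those two totals are equal.
    """
    true_u = np.asarray(true_u, dtype=float)
    approx_u = np.asarray(approx_u, dtype=float)

    order_true = np.argsort(-true_u, kind="stable")
    best, worst = order_true[:k], order_true[-k:]
    chosen = np.argsort(-approx_u, kind="stable")[:k]

    n_correct = int(np.intersect1d(best, chosen).size)
    max_u, min_u = true_u[best].sum(), true_u[worst].sum()
    proportion = float((true_u[chosen].sum() - min_u) / (max_u - min_u))
    return n_correct, proportion


# Hypothetical utilities for six applications, selecting k = 3
criterion = [90, 80, 70, 60, 50, 40]
technique = [85, 60, 75, 70, 40, 30]

n_correct, proportion = selection_performance(criterion, technique, k=3)
print(n_correct, round(proportion, 3))  # 2 0.778
```

In the example the technique picks two of the three best
applications, and the one it misses is replaced by the fourth best,
so most of the attainable utility is still captured. This is the
sense in which a selection error can cost little utility.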

The findings expressed in Table 4 are relatively
consistent across the three analyses so we will discuss them
together. First, and by far most important, is the fact
that all procedures do remarkably well. Except for the HOPE
procedure, all average correlations are above .86, more than
82.6 out of 98 applications were selected correctly for each
weight set-location measure combination, and a minimum of
93.5% of the total possible utility was realized. Given
that all techniques perform near the maximum, it is
virtually impossible to differentiate between them on the
basis of aggregate performance indices. Still, some
qualified statements can be made. There is some indication
of sensitivity to error in location measure judgments. We
found that approximately 30% of dimensions had
non-monotonicities for the judgmental location measures, leading
to an average drop in correlation from the bank model
location measures of .035. A drop of 3.25 was found in the
number of applications correctly identified as worthy of
credit, leading to a drop of 2% in the total utility
captured. The HOPE procedure resulted in very good weight
judgments but suffered most from poor location measure
placement. Over 73% of HOPE dimensions had non-monotonic
category placement.
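
The average drops quoted above can be checked against Table 4. The
sketch below uses the Table 4 entries as read from the source, with
digits damaged in scanning restored on the assumption that they agree
with the figures quoted in the text. The variable names are
illustrative only.

```python
from statistics import mean

# (average correlation with bank model, average number of the 98 selected)
judgmental = {
    "SMART": (0.881, 85.3),
    "Rank sum": (0.934, 87.7),
    "Rank reciprocal": (0.887, 83.9),
    "Rank exponent": (0.860, 82.6),
    "Point distribution": (0.921, 86.5),
    "Equal": (0.926, 86.0),
}
bank_model = {
    "SMART": (0.923, 88.8),
    "Rank sum": (0.964, 91.2),
    "Rank reciprocal": (0.927, 88.5),
    "Rank exponent": (0.907, 86.3),
    "Point distribution": (0.959, 90.6),
    "Equal": (0.938, 86.0),
}

# Drop when judgmental location measures replace the bank model's
corr_drop = mean(bank_model[t][0] - judgmental[t][0] for t in judgmental)
count_drop = mean(bank_model[t][1] - judgmental[t][1] for t in judgmental)

print(f"{corr_drop:.3f}")   # 0.035
print(f"{count_drop:.2f}")  # 3.23
```

The correlation drop matches the reported .035. The drop in correct
selections comes out near the reported 3.25, and the small gap is
consistent with rounding in the tabled averages.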

A second interesting finding is the quality of the
performance of equal weighting of importance dimensions. In
agreement with the theoretical findings of Wainer (1976;
1978) and Einhorn and Hogarth (1975) we found that simple
equal weighting of importance dimensions provided a
remarkably good approximation to the weighting of the true
bank model.


Discussion 


Expert subjects used several well known multi-attribute
utility weight elicitation techniques. The purpose of this
experiment was to find out how well each of these assessment
techniques replicated the results of a criterion model
developed in the environment of subjects' expertise. Both
the normalized weights and the decisions produced by the
weights were used for the comparison.

The use of judgmental decomposition methods to assess
multi-attribute utilities for credit applicants in this
study led to the same high quality of decisions found in
previous studies (Lathrop and Peters, 1969; John and
Edwards, 1979; John, Collins and Edwards, 1980). Although
there seem to be differences in the quality of the weights
themselves from one technique to another, these differences
do not carry over to the resulting decisions. There was
very little difference between the elicitation procedures in
the quality of these decisions and, in fact, simple equal
weighting of attributes performed extremely well.

The results of a holistic, bootstrapping procedure were
generally poorer. These results conflict with previous
studies of this technique (Barron and Person, 1979; John,
Collins and Edwards, 1980) as well as more general work on
holistic judgment (see, for example, Fischer, 1977; Dawes
and Corrigan, 1974). The reasons for this poorer
performance are not altogether clear, but it seems likely
that changes from the experts' normal judgment situation
dictated by time and technique constraints led to at least a
part of this decrement.

When making credit decisions in the performance of
their duties, the experts generally make a simple
dichotomous decision, i.e., credit or no credit. Only those
decisions very near the cut-off score require serious
consideration in this type of judgment. Those much higher
or lower need only cursory examination before the decision
becomes obvious. The HOPE procedure relies on judgments
across the range of value on all attributes such that many
of the holistic judgments required were some distance from
the cut-off score. The experts are not experienced at close
consideration of these judgments and poor judgments of these
extreme values could account for our results.

All procedures other than HOPE produced decisions of
such high quality that, so far as these data can guide us,
the appropriate basis for weighting judgments is ease of
use. We do not argue for the generality of this conclusion,
especially as it might be applied to negatively correlated
values.

The major difference found between the self-explicated
weighting procedures and the holistic procedure needs
further investigation. The difference may be due to the
task environment. Knowledge of the model is made available
to the experts, knowledge very similar to that required by
the decomposition procedures, while their "holistic
expertise" was limited to categorical judgments (accept,
reject). Unfortunately, 20 of the 25 cases used to elicit
the holistic judgments were easily classified as "reject."
This may have severely affected the accuracy of the required
holistic rating judgment.

Another reason for this finding may lie in the
attributes themselves. Attribute 1, the most important
predictor, includes historical information, while attribute
2 is purely a measure of immediate situation. In decomposed
judgments, the respondents may have given most weight to the
obviously important attribute that best describes the
current state of the applicant, while in holistic judgments
they may have assumed that relevant history incorporates
situational information. (We regret that the requirement to
keep the attributes confidential precludes a more detailed
discussion of the point.)

It is important to note the similarity of our results
with those of the MCPL study discussed earlier. John,
Collins and Edwards (1980) found high convergence between a
number of subjective weight elicitation techniques and the
criterion, as was found in this study. The implication for
future work is obvious. We can, with confidence, extend the
MCPL studies to investigation of real world situations where
no criterion exists.

Finally, a note of caution must be introduced. As
discussed earlier, the nature of the applications seen by
the bank officers, where all attributes were positively
related, makes this an insensitive situation for the
comparison of multi-attribute utility elicitation
techniques. We cannot be certain whether in another, more
sensitive decision situation strong differences would have
been found. In addition, we cannot estimate the ubiquity of
this insensitive situation for decision makers. Our results
merely show that in a single real world decision situation
experts are able to produce quality decisions using a number
of decomposition procedures. Our findings do not make
meaningful sensitivity analyses for important decision
problems unnecessary or irrelevant.